197 research outputs found
Estimation of intrafamilial DNA contamination in family trio genome sequencing using deviation from Mendelian inheritance
With the increasing number of sequencing projects involving families, quality control tools optimized for family genome sequencing are needed. However, accurately quantifying contamination in a DNA mixture is particularly difficult when genetically related family members are the sources. We developed TrioMix, a maximum likelihood estimation (MLE) framework based on Mendel's law of inheritance, to quantify DNA mixture between family members in genome sequencing data of parent-offspring trios. TrioMix can accurately deconvolute any intrafamilial DNA contamination, including parent-offspring, sibling-sibling, parent-parent, and even multiple familial sources. In addition, TrioMix can be applied to detect genomic abnormalities that deviate from Mendelian inheritance patterns, such as uniparental disomy (UPD) and chimerism. A genome-wide depth and variant allele frequency plot generated by TrioMix facilitates tracing the origin of Mendelian inheritance deviations. We showed that TrioMix could accurately deconvolute genomes in both simulated and real data sets.
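The idea of estimating contamination from deviations in Mendelian inheritance can be illustrated with a minimal sketch. This is not TrioMix's implementation: it assumes a single simplified case (maternal DNA mixed into the child's sample), uses only loci where the father is homozygous reference and the mother is homozygous alternate, and models read counts as binomial. The function name `estimate_maternal_fraction` and the input layout are hypothetical.

```python
import math

def estimate_maternal_fraction(sites, grid_steps=1000):
    """Grid-search MLE of the fraction c of maternal DNA in a child sample.

    sites: list of (alt_reads, depth) pairs observed in the child at loci
    where the father is homozygous reference and the mother is homozygous
    alternate. An uncontaminated child is heterozygous at such loci, so the
    expected alternate-allele fraction is 0.5.
    """
    best_c, best_ll = 0.0, -math.inf
    for i in range(grid_steps + 1):
        c = i / grid_steps
        # Mixture of child reads (alt fraction 0.5) and maternal reads (1.0).
        p = (1 - c) * 0.5 + c * 1.0
        p = min(max(p, 1e-9), 1 - 1e-9)  # keep log() finite at the boundary
        # Binomial log-likelihood; the binomial coefficient is constant in c
        # and is therefore omitted.
        ll = sum(a * math.log(p) + (d - a) * math.log(1 - p) for a, d in sites)
        if ll > best_ll:
            best_c, best_ll = c, ll
    return best_c

# Made-up read counts: pooled alt fraction is 105/175 = 0.6.
example_sites = [(60, 100), (30, 50), (15, 25)]
print(estimate_maternal_fraction(example_sites))
```

In this simplified model the pooled alternate-allele fraction of 0.6 corresponds to a maternal fraction of 2 × 0.6 − 1 = 0.2. Handling other contamination sources or several sources at once, as the abstract describes, would need additional genotype configurations and parameters that this sketch does not cover.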
The Cure: Making a game of gene selection for breast cancer survival prediction
Motivation: Molecular signatures for predicting breast cancer prognosis could
greatly improve care through personalization of treatment. Computational
analyses of genome-wide expression datasets have identified such signatures,
but these signatures leave much to be desired in terms of accuracy,
reproducibility and biological interpretability. Methods that take advantage of
structured prior knowledge (e.g. protein interaction networks) show promise in
helping to define better signatures but most knowledge remains unstructured.
Crowdsourcing via scientific discovery games is an emerging methodology that
has the potential to tap into human intelligence at scales and in modes
previously unheard of. Here, we developed and evaluated a game called The Cure
on the task of gene selection for breast cancer survival prediction. Our
central hypothesis was that knowledge linking expression patterns of specific
genes to breast cancer outcomes could be captured from game players. We
envisioned capturing knowledge both from the players' prior experience and from
their ability to interpret text related to candidate genes presented to them in
the context of the game.
Results: Between its launch in Sept. 2012 and Sept. 2013, The Cure attracted
more than 1,000 registered players who collectively played nearly 10,000 games.
Gene sets assembled through aggregation of the collected data clearly
demonstrated the accumulation of relevant expert knowledge. In terms of
predictive accuracy, these gene sets provided comparable performance to gene
sets generated using other methods including those used in commercial tests.
The Cure is available at http://genegames.org/cure
Informatics for RNA sequencing: A web resource for analysis on the cloud
Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki